Speaker-independent connected letter recognition with a multi-state time delay neural network
Abstract
We present a Multi-State Time Delay Neural Network (MS-TDNN) for speaker-independent, connected letter recognition. Our MS-TDNN achieves 98.5/92.0% word accuracy on speaker dependent/independent English letter tasks [7, 8]. In this paper we will summarize several techniques to improve (a) continuous recognition performance, such as sentence level training, and (b) phonetic modeling, such as network architectures with "internal speaker models", allowing for "tuning-in" to new speakers. We also present results on our large and still growing German Letter data base, containing over 40,000 letters continuously spelled by 55 speakers.
European Conference on Speech, Communication and Technology (EUROSPEECH 93), Berlin, Germany, Volume 2, pp. 1481-1484.
...elled Letter Recognition, Speaker-...
Similar resources
Connected Letter Recognition with a Multi-State Time Delay Neural Network
The Multi-State Time Delay Neural Network (MS-TDNN) integrates a nonlinear time alignment procedure (DTW) and the high-accuracy phoneme spotting capabilities of a TDNN into a connectionist speech recognition system with word-level classification and error backpropagation. We present an MS-TDNN for recognizing continuously spelled letters, a task characterized by a small but highly confusable voc...
Multi-State Time Delay Neural Networks for Continuous Speech Recognition
Alex Waibel, Carnegie Mellon University, Pittsburgh, PA 15213. We present the "Multi-State Time Delay Neural Network" (MS-TDNN) as an extension of the TDNN to robust word recognition. Unlike most other hybrid methods, the MS-TDNN embeds an alignment search procedure into the connectionist architecture, and allows for word level supervision. The resulting system has the ability to ma...
Connectionist Architectures for Multi-Speaker Phoneme Recognition
We present a number of Time-Delay Neural Network (TDNN) based architectures for multi-speaker phoneme recognition (/b,d,g/ task). We use speech of two females and four males to compare the performance of the various architectures against a baseline recognition rate of 95.9% for a single TDNN on the six-speaker /b,d,g/ task. This series of modular designs leads to a highly modular multi-network ...
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable performance gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Speaker Independent Vowel Recognition using Backpropagation Neural Network on Master-Slave Architecture
The objective of this work is speaker-independent recognition of the vowels of British English. Backpropagation is one of the simplest and most widely used methods for supervised training of multilayer neural networks. In this paper we use a parallel implementation of backpropagation (BP) on a master-slave architecture to recognize eleven speaker-independent steady-state vowels of British English. We pe...